Pure pose solution method and system for multi-view camera pose and scene

ABSTRACT

A pure pose solution method and system for a multi-view camera pose and scene are provided. The method includes: a pure rotation recognition (PRR) step: performing PRR on all views, and marking views having a pure rotation abnormality, to obtain marked views and non-marked views; a global translation linear (GTL) calculation step: selecting one of the non-marked views as a reference view, constructing a constraint t_(r)=0, constructing a GTL constraint, solving a global translation $\hat{t}$, reconstructing a global translation of the marked views according to t_(r) and $\hat{t}$, and screening out a correct solution of the global translation; and a structure analytical reconstruction (SAR) step: performing analytical reconstruction on coordinates of all 3D points according to a correct solution of a global pose. The method and system can greatly improve the computational efficiency and robustness of the multi-view camera pose and scene structure reconstruction.

CROSS REFERENCE TO THE RELATED APPLICATIONS

This application is the national phase entry of International Application No. PCT/CN2019/130316, filed on Dec. 31, 2019, which is based upon and claims priority to Chinese Patent Application No. 201911267354.1, filed on Dec. 11, 2019, the entire contents of which are incorporated herein by reference.

TECHNICAL FIELD

The present disclosure relates to the field of computer vision, and in particular, to a pure pose solution method and system for a multi-view camera pose and scene.

BACKGROUND

Reconstruction of camera pose and scene structure has always been a core problem of structure from motion in computer vision. In the conventional multi-view geometric description, the reconstruction of camera pose and scene structure requires initialization of global parameters and bundle adjustment (BA). On the one hand, the initialization of global parameters provides initial values for the BA and mainly includes initialization of global attitude, global translation, and three-dimensional (3D) scene point coordinates, where the difficulty lies in the initialization of global translation. The conventional global translation method generally takes the relative translations of two-view pairs as input and optimizes the global translation by minimizing an algebraic error. The conventional global translation method may fail in cases such as pure camera rotation or collinear camera motion. On the other hand, the objective of the BA is to minimize a re-projection error. The parameter space includes 3D scene point coordinates, pose parameters, camera parameters, and others. In the case of m 3D scene points and n images, the space dimensionality of parameters to be optimized is 3m+6n. There are generally a large number of 3D scene points, resulting in high space dimensionality of the parameters to be optimized.

A patent (Patent Publication No. CN106408653A) discloses a real-time robust bundle adjustment method for large-scale 3D reconstruction. The current mainstream BA method is a nonlinear optimization algorithm considering the sparsity of a parametric Jacobian matrix, but the method still fails to meet the real-time and robustness requirements in large-scale scenes.

SUMMARY

To solve the defects in the prior art, an objective of the present disclosure is to provide a pure pose solution method and system for a multi-view camera pose and scene.

The present disclosure provides a pure pose solution method for a multi-view camera pose and scene, where the method uses initial attitude values of views as an input and includes:

a pure rotation recognition (PRR) step: performing PRR on all views and marking views having a pure rotation abnormality to obtain marked views and non-marked views;

a global translation linear (GTL) calculation step: selecting one of the non-marked views as a reference view, constructing a constraint t_(r)=0, constructing a GTL constraint, solving a global translation $\hat{t}$, reconstructing a global translation of the marked views according to t_(r) and $\hat{t}$, and screening out a correct solution of the global translation; and

a structure analytical reconstruction (SAR) step: performing analytical reconstruction on coordinates of all 3D points according to a correct solution of a global pose.

Preferably, the PRR step includes the following steps:

step 1: for a view i (1≤i≤N) and a view j∈V_(i), calculating θ_(i,j)=∥[X_(j)]_(x)R_(i,j)X_(i)∥ by using all image matching point pairs (X_(i),X_(j)) and a relative attitude R_(i,j) of dual views (i,j) and constructing sets Θ_(i,j) and

${\Theta_{i} = {\bigcup\limits_{j \in V_{i}}\Theta_{i,j}}},$

where a proportion of elements in Θ_(i) that are greater than δ₁ is denoted by γ_(i);

step 2: if γ_(i)<δ₂, marking the view i as a pure rotation abnormality view, recording a mean value of elements in the set Θ_(i,j) as $\bar{\theta}_{i,j}$, letting

${l = {\underset{j \in V_{i}}{argmin}\left\{ {\overset{¯}{\theta}}_{i,j} \right\}}},$

and constructing a constraint t_(i)=t_(l);

where if a 3D point X^(W)=(x^(W), y^(W), z^(W))^(T) is visible in n (≤N) views, for i=1, 2 . . . , n, V_(i) is a set composed of all co-views of the view i; X_(i) and X_(j) represent normalized image coordinates of a point X^(W) on the view i and the view j, respectively; δ₁ and δ₂ are specified thresholds; R_(i) and t_(i) represent a global attitude and a global translation of the view i, respectively; R_(i,j) (=R_(j)R_(i) ^(T)) and t_(i,j) represent a relative attitude and a relative translation of the dual views (i,j), respectively; and [X_(j)]_(x) represents an antisymmetric matrix formed by vectors X_(j); and

step 3: repeating step 1 to step 2 for all the views.
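The PRR steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the implementation of the disclosure: the camera model X_cam = R_(i)(X^(W) − t_(i)), the synthetic scene, the threshold values passed as `delta1` and `delta2`, and the function names `theta_values` and `recognize_pure_rotation` are all hypothetical choices made for the example.

```python
import numpy as np

def rot(axis, angle):
    """Rotation matrix about a unit axis (Rodrigues formula)."""
    x, y, z = axis
    K = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def theta_values(X_i, X_j, R_ij):
    """theta_{i,j} = ||[X_j]_x R_{i,j} X_i|| for each matched pair (one per row)."""
    return np.linalg.norm(np.cross(X_j, X_i @ R_ij.T), axis=1)

def recognize_pure_rotation(theta_sets, delta1, delta2):
    """theta_sets maps each co-view j of view i to the array Theta_{i,j}.

    Returns (marked, l): marked is True when gamma_i < delta2, and l is the
    co-view with the smallest mean theta (used for the constraint t_i = t_l).
    """
    theta_all = np.concatenate(list(theta_sets.values()))  # Theta_i
    gamma = np.mean(theta_all > delta1)                     # gamma_i
    if gamma < delta2:
        l = min(theta_sets, key=lambda j: theta_sets[j].mean())
        return True, l
    return False, None

# Synthetic scene (assumption): views 0, 1, 2 share one camera centre,
# view 3 is translated by one unit along x.
rng = np.random.default_rng(0)
points = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(50, 3))
R = [np.eye(3), rot((0, 1, 0), 0.2), rot((1, 0, 0), 0.1), rot((0, 1, 0), -0.1)]
t = [np.zeros(3), np.zeros(3), np.zeros(3), np.array([1.0, 0.0, 0.0])]

def observe(i):
    """Normalized image coordinates of all points in view i."""
    Xc = (points - t[i]) @ R[i].T
    return Xc / Xc[:, 2:3]

X = [observe(i) for i in range(4)]

def theta_sets_for(i):
    """Theta_{i,j} for every co-view j of view i, with R_{i,j} = R_j R_i^T."""
    return {j: theta_values(X[i], X[j], R[j] @ R[i].T) for j in range(4) if j != i}

marked_0, l_0 = recognize_pure_rotation(theta_sets_for(0), delta1=1e-3, delta2=0.5)
marked_3, l_3 = recognize_pure_rotation(theta_sets_for(3), delta1=1e-3, delta2=0.5)
print(marked_0, l_0, marked_3, l_3)
```

In this toy scene, view 0 sees two of its three co-views under pure rotation, so only a third of its θ values exceed δ₁ and it is marked; view 3 has a non-zero baseline to every co-view and is not marked.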

Preferably, the GTL calculation step includes the following steps:

step 1: for a current 3D point, selecting views

${\left( {\varsigma,\eta} \right) = {\underset{{1 \leq i},{j \leq n}}{\arg\max}\left\{ \theta_{i,j} \right\}}},$

where ς is a left baseline view, and η is a right baseline view;

step 2: for all the non-marked views (excluding the reference view), constructing a GTL constraint according to the form of $B t_{\eta} + C t_{i} + D t_{\varsigma} = 0$;

where normalized image coordinates of a 3D point X^(W) on the view i satisfy $X_{i} \sim R_{\varsigma,i} X_{\varsigma} a^{T} t_{\varsigma,\eta} + \theta_{\varsigma,\eta}^{2} t_{\varsigma,i} \triangleq Y_{i}$, $\sim$ represents an equation under homogeneous coordinates, $a^{T} = -([X_{\eta}]_{\times} R_{\varsigma,\eta} X_{\varsigma})^{T} [X_{\eta}]_{\times}$, and the superscript T represents transposition of a matrix or vector. In order to solve the global translation linearly, different target function forms are defined, for example, (I₃−X_(i)e₃ ^(T))Y_(i)=0 and [X_(i)]_(x)Y_(i)=0; I₃ represents a 3×3 identity matrix and e₃=(0,0,1)^(T) represents the third column vector of the identity matrix. In addition, because the relative translation t_(i,j) has different forms with respect to the global translation, for example, t_(i,j)=R_(j)(t_(i)−t_(j)) and t_(i,j)=t_(j)−R_(i,j)t_(i), matrices B, C, and D also have different forms correspondingly:

(1) for the target function [X_(i)]_(x)Y_(i)=0 and the relative translation t_(i,j)=R_(j)(t_(i)−t_(j)): $B = [X_{i}]_{\times} R_{\varsigma,i} X_{\varsigma} a^{T} R_{\eta}$, $C = \theta_{\varsigma,\eta}^{2} [X_{i}]_{\times} R_{i}$, $D = -(B + C)$;

(2) for the target function (I₃−X_(i)e₃ ^(T))Y_(i)=0 and the relative translation t_(i,j)=R_(j)(t_(i)−t_(j)): $B = (I_{3} - X_{i} e_{3}^{T}) R_{\varsigma,i} X_{\varsigma} a^{T} R_{\eta}$, $C = \theta_{\varsigma,\eta}^{2} (I_{3} - X_{i} e_{3}^{T}) R_{i}$, $D = -(B + C)$;

(3) for the target function [X_(i)]_(x)Y_(i)=0 and the relative translation t_(i,j)=t_(j)−R_(i,j)t_(i): $B = [X_{i}]_{\times} R_{\varsigma,i} X_{\varsigma} a^{T}$, $C = \theta_{\varsigma,\eta}^{2} [X_{i}]_{\times}$, $D = -(B R_{\varsigma,\eta} + C R_{\varsigma,i})$; and

(4) for the target function (I₃−X_(i)e₃ ^(T))Y_(i)=0 and the relative translation t_(i,j)=t_(j)−R_(i,j)t_(i): $B = (I_{3} - X_{i} e_{3}^{T}) R_{\varsigma,i} X_{\varsigma} a^{T}$, $C = \theta_{\varsigma,\eta}^{2} (I_{3} - X_{i} e_{3}^{T})$, $D = -(B R_{\varsigma,\eta} + C R_{\varsigma,i})$;

step 3: repeating step 1 to step 2 for other 3D points, constructing a linear equation, and solving the global translation $\hat{t}$;

step 4: reconstructing the global translation of the marked views according to t_(i)=t_(l) by using $\hat{t}$ and t_(r); and

step 5: screening out the correct solution of the global translation t according to $a^{T} t_{\varsigma,\eta} \geq 0$.
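As a sanity check on the GTL constraint of form (1), the following sketch builds B, C, and D for a single 3D point seen in three views and confirms numerically that Bt_(η)+Ct_(i)+Dt_(ς) vanishes for noise-free data. It assumes the camera model X_cam = R_(i)(X^(W) − t_(i)), which matches the relative translation t_(i,j)=R_(j)(t_(i)−t_(j)); the poses, the point, and the helper names (`skew`, `rodrigues`, `project`, `gtl_blocks`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def skew(v):
    """Antisymmetric matrix [v]_x, so that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def rodrigues(w):
    """Rotation matrix from a non-zero axis-angle vector w."""
    angle = np.linalg.norm(w)
    K = skew(w / angle)
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def project(R, t, Xw):
    """Normalized image coordinates, assuming X_cam = R (Xw - t)."""
    Xc = R @ (Xw - t)
    return Xc / Xc[2]

def gtl_blocks(X_s, X_e, X_i, R_s, R_e, R_i):
    """B, C, D of form (1): target [X_i]_x Y_i = 0, t_{i,j} = R_j (t_i - t_j).

    Index s stands for the left baseline view, e for the right baseline view.
    """
    R_se = R_e @ R_s.T                      # R_{s,e} = R_e R_s^T
    R_si = R_i @ R_s.T
    u = skew(X_e) @ R_se @ X_s              # [X_e]_x R_{s,e} X_s
    a = -(u @ skew(X_e))                    # row vector a^T stored as a 1-D array
    theta_sq = u @ u                        # theta_{s,e}^2
    B = skew(X_i) @ np.outer(R_si @ X_s, a) @ R_e
    C = theta_sq * (skew(X_i) @ R_i)
    D = -(B + C)
    return B, C, D, a

# Hypothetical noise-free configuration: view 0 = left baseline, 1 = right, 2 = view i.
Xw = np.array([0.3, -0.2, 6.0])
R = [np.eye(3),
     rodrigues(np.array([0.0, 0.1, 0.0])),
     rodrigues(np.array([0.05, -0.05, 0.02]))]
t = [np.zeros(3), np.array([1.0, 0.0, 0.1]), np.array([0.2, 0.8, -0.1])]
X_s, X_e, X_i = (project(R[k], t[k], Xw) for k in range(3))

B, C, D, a = gtl_blocks(X_s, X_e, X_i, R[0], R[1], R[2])
residual = B @ t[1] + C @ t[2] + D @ t[0]   # B t_eta + C t_i + D t_s
depth_sign = a @ (R[1] @ (t[0] - t[1]))     # a^T t_{s,eta}, used in step 5
print(residual, depth_sign)
```

For exact data the residual is zero up to rounding, and a^(T)t_(ς,η) is positive because it equals the squared θ of the baseline pair times the depth of the point in the left baseline view, which is why step 5 can use its sign to pick the correct solution.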

Preferably, an optional camera pose optimization step is added between the GTL calculation step and the SAR step:

expressing image homogeneous coordinates f_(i) of the 3D point X^(W) on the view i as follows:

$f_{i} \sim R_{\varsigma,i} z_{\varsigma}^{W} X_{\varsigma} + t_{\varsigma,i}$

where $\sim$ represents an equation under homogeneous coordinates, $z_{\varsigma}^{W} = \|[X_{\eta}]_{\times} t_{\varsigma,\eta}\| / \theta_{\varsigma,\eta}$, and a re-projection error is defined as follows:

$\varepsilon_{i} = \frac{f_{i}}{e_{3}^{T} f_{i}} - \tilde{f}_{i}$

where $\tilde{f}_{i}$ represents image coordinates of a 3D point on the view i and a third element is 1. For all views of the 3D point, a re-projection error vector ε is formed. For all 3D points, an error vector Σ is formed. A target function of global pose optimization is described as arg min Σ^(T)Σ, and an optimization solution of the global pose is calculated accordingly. It should be noted that the camera pose optimization step may be replaced with another optimization algorithm, such as a classic BA algorithm; in this case, the 3D scene point coordinates may adopt an output result of the classic BA algorithm or may be obtained by using the following SAR step.
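The re-projection error above depends only on poses and image observations, which the following sketch illustrates for one 3D point. It is a hedged example, not the optimizer of the disclosure: it assumes normalized (calibrated) image coordinates as the observation, the camera model X_cam = R_(i)(X^(W) − t_(i)) with t_(i,j)=R_(j)(t_(i)−t_(j)), and hypothetical names (`rodrigues`, `project`, `pose_only_residual`).

```python
import numpy as np

def rodrigues(w):
    """Rotation matrix from a non-zero axis-angle vector w."""
    angle = np.linalg.norm(w)
    x, y, z = w / angle
    K = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def project(R, t, Xw):
    """Normalized image coordinates, assuming X_cam = R (Xw - t)."""
    Xc = R @ (Xw - t)
    return Xc / Xc[2]

def pose_only_residual(X_s, X_e, f_obs, R_s, R_e, R_i, t_s, t_e, t_i):
    """Re-projection error of one point on view i, written without 3D coordinates.

    s is the left baseline view, e the right baseline view, f_obs the observed
    image coordinates on view i (third element equal to 1).
    """
    R_se = R_e @ R_s.T
    R_si = R_i @ R_s.T
    t_se = R_e @ (t_s - t_e)                    # relative translation t_{s,e}
    t_si = R_i @ (t_s - t_i)                    # relative translation t_{s,i}
    theta = np.linalg.norm(np.cross(X_e, R_se @ X_s))
    z_s = np.linalg.norm(np.cross(X_e, t_se)) / theta   # depth in view s
    f_i = z_s * (R_si @ X_s) + t_si             # homogeneous image coordinates
    return f_i / f_i[2] - f_obs

# Hypothetical noise-free configuration.
Xw = np.array([0.3, -0.2, 6.0])
R_s, t_s = np.eye(3), np.zeros(3)
R_e, t_e = rodrigues(np.array([0.0, 0.1, 0.0])), np.array([1.0, 0.0, 0.0])
R_i, t_i = rodrigues(np.array([0.05, 0.0, 0.0])), np.array([0.0, 1.0, 0.0])
X_s, X_e, f_obs = project(R_s, t_s, Xw), project(R_e, t_e, Xw), project(R_i, t_i, Xw)

eps_exact = pose_only_residual(X_s, X_e, f_obs, R_s, R_e, R_i, t_s, t_e, t_i)
eps_shift = pose_only_residual(X_s, X_e, f_obs, R_s, R_e, R_i, t_s, t_e,
                               t_i + np.array([0.1, 0.0, 0.0]))
print(eps_exact, eps_shift)
```

With the true poses the residual vanishes, and shifting the translation of view i produces a non-zero residual. Stacking such residuals over all views and points gives the vector whose squared norm a nonlinear least-squares solver would minimize over the global poses.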

Preferably, the SAR step includes:

performing analytical and weighted reconstruction on a multi-view 3D scene structure according to a camera pose;

for a current 3D point, calculating a depth of field in the left baseline view ς:

${\hat{z}}_{\varsigma}^{W} = {\sum\limits_{\substack{1 \leq j \leq n \\ j \neq \varsigma}}{\omega_{\varsigma,j}d_{\varsigma}^{({\varsigma,j})}}}$

calculating a depth of field in the right baseline view:

${\hat{z}}_{\eta}^{W} = {\sum\limits_{\substack{1 \leq j \leq n \\ j \neq \eta}}{\omega_{j,\eta}d_{\eta}^{({j,\eta})}}}$

where $d_{\eta}^{(j,\eta)} = \|[R_{j,\eta} X_{j}]_{\times} t_{j,\eta}\| / \theta_{j,\eta}$, and $\omega_{\varsigma,j}$ and $\omega_{j,\eta}$ represent weighting coefficients. For example, in analytical reconstruction of a 3D point based on the depth of field in the left baseline view, it is specified that

${\omega_{\varsigma,j} = {\theta_{\varsigma,j}/{\sum\limits_{\substack{1 \leq j \leq n \\ j \neq \varsigma}}\theta_{\varsigma,j}}}},$

and in this case, coordinates of the current 3D feature point are as follows:

$X^{W} = {\hat{z}}_{\varsigma}^{W} R_{\varsigma}^{T} X_{\varsigma} + t_{\varsigma}$


Coordinates of all the 3D points can be obtained through analytical reconstruction. Similarly, the coordinates of the 3D points can be obtained through analytical reconstruction based on the depth of field in the right baseline view. An arithmetic mean of the foregoing two categories of coordinate values of the 3D points can be calculated.
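The weighted analytical reconstruction from the left baseline view can be sketched as follows. Several points are assumptions made for this example rather than statements of the disclosure: the camera model X_cam = R_(i)(X^(W) − t_(i)) with t_(i,j)=R_(j)(t_(i)−t_(j)), the pairwise depth d computed as ∥[X_(j)]_(x)t_(ς,j)∥/θ_(ς,j) by analogy with the right-view formula, the back-projection X^(W) = z R^(T) X + t that follows from that camera model, and the function names.

```python
import numpy as np

def rodrigues(w):
    """Rotation matrix from a non-zero axis-angle vector w."""
    angle = np.linalg.norm(w)
    x, y, z = w / angle
    K = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def project(R, t, Xw):
    """Normalized image coordinates, assuming X_cam = R (Xw - t)."""
    Xc = R @ (Xw - t)
    return Xc / Xc[2]

def reconstruct_from_left(X, R, t, s):
    """Weighted depth in the left baseline view s, then back-projection.

    X, R, t are per-view lists of normalized coordinates, global attitudes,
    and global translations for one 3D point.
    """
    thetas, depths = [], []
    for j in range(len(R)):
        if j == s:
            continue
        R_sj = R[j] @ R[s].T
        t_sj = R[j] @ (t[s] - t[j])
        th = np.linalg.norm(np.cross(X[j], R_sj @ X[s]))         # theta_{s,j}
        thetas.append(th)
        depths.append(np.linalg.norm(np.cross(X[j], t_sj)) / th)  # pairwise depth
    thetas, depths = np.array(thetas), np.array(depths)
    omega = thetas / thetas.sum()          # weights proportional to theta_{s,j}
    z_hat = omega @ depths                 # weighted depth estimate in view s
    return z_hat * (R[s].T @ X[s]) + t[s]

# Hypothetical noise-free configuration with three views.
Xw = np.array([0.3, -0.2, 6.0])
R = [np.eye(3),
     rodrigues(np.array([0.0, 0.1, 0.0])),
     rodrigues(np.array([0.05, -0.05, 0.02]))]
t = [np.zeros(3), np.array([1.0, 0.0, 0.1]), np.array([0.2, 0.8, -0.1])]
X = [project(R[k], t[k], Xw) for k in range(3)]

X_rec = reconstruct_from_left(X, R, t, s=0)
print(X_rec)
```

Weighting by θ gives more influence to view pairs with larger parallax, whose depth estimates are less sensitive to noise; with exact data every pairwise depth agrees and the point is recovered exactly.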

The present disclosure provides a pure pose solution system for a multi-view camera pose and scene, including:

a PRR module configured to perform PRR on all views, and mark views having a pure rotation abnormality to obtain marked views and non-marked views;

a GTL calculation module configured to select one of the non-marked views as a reference view, construct a constraint t_(r)=0, construct a GTL constraint, solve a global translation $\hat{t}$, reconstruct a global translation of the marked views according to t_(r) and $\hat{t}$, and screen out a correct solution of the global translation; and

an SAR module configured to perform analytical reconstruction on coordinates of all 3D points according to a correct solution of a global pose.

Preferably, the PRR module includes the following modules:

a module M11 configured to: for a view i (1≤i≤N) and a view j∈V_(i), calculate θ_(i,j)=∥[X_(j)]_(x)R_(i,j)X_(i)∥ by using all image matching point pairs (X_(i),X_(j)) and a relative attitude R_(i,j) of dual views (i,j) and construct sets Θ_(i,j) and

${\Theta_{i} = {\bigcup\limits_{j \in V_{i}}\Theta_{i,j}}},$

where a proportion of elements in Θ_(i) that are greater than δ₁ is denoted by γ_(i);

a module M12 configured to: if γ_(i)<δ₂, mark the view i as a pure rotation abnormality view, record a mean value of elements in the set Θ_(i,j) as $\bar{\theta}_{i,j}$, let

${l = {\underset{j \in V_{i}}{argmin}\left\{ {\overset{¯}{\theta}}_{i,j} \right\}}},$

and construct a constraint t_(i)=t_(l);

where if a 3D point X^(W)=(x^(W),y^(W),z^(W))^(T) is visible in n (≤N) views, for i=1, 2 . . . , n, V_(i) is a set composed of all co-views of the view i; X_(i) and X_(j) represent normalized image coordinates of a point X^(W) on the view i and the view j, respectively; δ₁ and δ₂ are specified thresholds; R_(i) and t_(i) represent a global attitude and a global translation of the view i, respectively; R_(i,j) (=R_(j)R_(i) ^(T)) and t_(i,j) represent a relative attitude and a relative translation of the dual views (i,j), respectively; and [X_(j)]_(x) represents an antisymmetric matrix formed by vectors X_(j); and

a module M13 configured to repeat operations of the module M11 to the module M12 for all the views.

Preferably, the GTL calculation module includes:

a module M21 configured to: for a current 3D point, select views

${\left( {\varsigma,\eta} \right) = {\underset{{1 \leq i},{j \leq n}}{\arg\max}\left\{ \theta_{i,j} \right\}}},$

where ς is a left baseline view, and η is a right baseline view;

a module M22 configured to: for all the non-marked views, construct a GTL constraint according to the form of $B t_{\eta} + C t_{i} + D t_{\varsigma} = 0$;

where normalized image coordinates of a 3D point X^(W) on the view i satisfy $X_{i} \sim R_{\varsigma,i} X_{\varsigma} a^{T} t_{\varsigma,\eta} + \theta_{\varsigma,\eta}^{2} t_{\varsigma,i} \triangleq Y_{i}$, $\sim$ represents an equation under homogeneous coordinates, $a^{T} = -([X_{\eta}]_{\times} R_{\varsigma,\eta} X_{\varsigma})^{T} [X_{\eta}]_{\times}$, and the superscript T represents transposition of a matrix or vector.

In addition, because the relative translation t_(i,j) has different forms with respect to the global translation, matrices B, C, and D also have different forms correspondingly:

(1) for the target function [X_(i)]_(x)Y_(i)=0 and the relative translation t_(i,j)=R_(j)(t_(i)−t_(j)): $B = [X_{i}]_{\times} R_{\varsigma,i} X_{\varsigma} a^{T} R_{\eta}$, $C = \theta_{\varsigma,\eta}^{2} [X_{i}]_{\times} R_{i}$, $D = -(B + C)$;

(2) for the target function (I₃−X_(i)e₃ ^(T))Y_(i)=0 and the relative translation t_(i,j)=R_(j)(t_(i)−t_(j)): $B = (I_{3} - X_{i} e_{3}^{T}) R_{\varsigma,i} X_{\varsigma} a^{T} R_{\eta}$, $C = \theta_{\varsigma,\eta}^{2} (I_{3} - X_{i} e_{3}^{T}) R_{i}$, $D = -(B + C)$;

(3) for the target function [X_(i)]_(x)Y_(i)=0 and the relative translation t_(i,j)=t_(j)−R_(i,j)t_(i): $B = [X_{i}]_{\times} R_{\varsigma,i} X_{\varsigma} a^{T}$, $C = \theta_{\varsigma,\eta}^{2} [X_{i}]_{\times}$, $D = -(B R_{\varsigma,\eta} + C R_{\varsigma,i})$; and

(4) for the target function (I₃−X_(i)e₃ ^(T))Y_(i)=0 and the relative translation t_(i,j)=t_(j)−R_(i,j)t_(i): $B = (I_{3} - X_{i} e_{3}^{T}) R_{\varsigma,i} X_{\varsigma} a^{T}$, $C = \theta_{\varsigma,\eta}^{2} (I_{3} - X_{i} e_{3}^{T})$, $D = -(B R_{\varsigma,\eta} + C R_{\varsigma,i})$;

a module M23 configured to repeat operations of the module M21 to the module M22 for other 3D points, construct a linear equation, and solve the global translation $\hat{t}$;

a module M24 configured to reconstruct the global translation of the marked views according to t_(i)=t_(l) by using $\hat{t}$ and t_(r); and

a module M25 configured to screen out the correct solution of the global translation t according to $a^{T} t_{\varsigma,\eta} \geq 0$.

Preferably, the system further includes a camera pose optimization module configured to: express image homogeneous coordinates f_(i) of the 3D point X^(W) on the view i as follows:

$f_{i} \sim R_{\varsigma,i} z_{\varsigma}^{W} X_{\varsigma} + t_{\varsigma,i}$

where $\sim$ represents an equation under homogeneous coordinates, $z_{\varsigma}^{W} = \|[X_{\eta}]_{\times} t_{\varsigma,\eta}\| / \theta_{\varsigma,\eta}$, and a re-projection error is defined as follows:

$\varepsilon_{i} = {\frac{f_{i}}{e_{3}^{T}f_{i}} - {\tilde{f}}_{i}}$

where e₃ ^(T)=(0,0,1), $\tilde{f}_{i}$ represents image coordinates of a 3D point on the view i and a third element is 1. For all views of the 3D point, a re-projection error vector ε is formed. For all 3D points, an error vector Σ is formed. A target function of global pose optimization is described as arg min Σ^(T)Σ, and an optimization solution of the global pose is calculated accordingly.

Alternatively, the camera pose optimization step is replaced with a classic BA algorithm; in this case, the 3D scene point coordinates adopt an output result of the classic BA algorithm or are obtained by using the SAR step.

Preferably, the SAR module is configured to:

perform analytical and weighted reconstruction on a multi-view 3D scene structure according to a camera pose;

for a current 3D point, calculate a depth of field in the left baseline view ς:

${\hat{z}}_{\varsigma}^{W} = {\sum\limits_{\substack{1 \leq j \leq n \\ j \neq \varsigma}}{\omega_{\varsigma,j}d_{\varsigma}^{({\varsigma,j})}}}$

calculate a depth of field in the right baseline view:

${\hat{z}}_{\eta}^{W} = {\sum\limits_{\substack{1 \leq j \leq n \\ j \neq \eta}}{\omega_{j,\eta}d_{\eta}^{({j,\eta})}}}$

where $d_{\eta}^{(j,\eta)} = \|[R_{j,\eta} X_{j}]_{\times} t_{j,\eta}\| / \theta_{j,\eta}$, and $\omega_{\varsigma,j}$ and $\omega_{j,\eta}$ represent weighting coefficients; and

perform analytical reconstruction to obtain coordinates of all the 3D points accordingly; or perform analytical reconstruction to obtain coordinates of the 3D points by using the depth of field of the right baseline view, or calculate an arithmetic mean of the foregoing two categories of coordinate values of the 3D points.

Compared with the prior art, the present disclosure has the following beneficial effects.

The present disclosure solves the bottleneck problem of traditional initial value and optimization methods and can substantially improve the robustness and computational speed of the camera pose and scene structure reconstruction.

BRIEF DESCRIPTION OF THE DRAWINGS

Other features, objectives, and advantages of the present disclosure will become more apparent by reading the detailed description of non-limiting embodiments with reference to the following accompanying drawings.

The FIGURE is a flowchart of the method of the present disclosure.

DETAILED DESCRIPTION OF THE EMBODIMENTS

The present disclosure is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to further understand the present disclosure, but they do not limit the present disclosure in any way. It should be noted that several variations and improvements can also be made by a person of ordinary skill in the art without departing from the ideas of the present disclosure. These all fall within the protection scope of the present disclosure.

As shown in FIGURE, the present disclosure provides a pure pose solution method for a multi-view camera pose and scene, where the method uses initial attitude values of views as an input and includes the following steps:

PRR step: Perform PRR on all views and mark views having a pure rotation abnormality to obtain marked views and non-marked views.

GTL calculation step: Select one of the non-marked views as a reference view, construct a constraint t_(r)=0, construct a GTL constraint, solve a global translation $\hat{t}$, reconstruct a global translation of the marked views according to t_(r) and $\hat{t}$, and screen out a correct solution of the global translation.

SAR step: Perform analytical reconstruction on coordinates of all 3D points according to a correct solution of a global pose.

The PRR step includes the following steps:

Step 1: For a view i (1≤i≤N) and a view j∈V_(i), calculate θ_(i,j)=∥[X_(j)]_(x)R_(i,j)X_(i)∥ by using all image matching point pairs (X_(i),X_(j)) and a relative attitude R_(i,j) of dual views (i,j) and construct sets Θ_(i,j) and

${\Theta_{i} = {\bigcup\limits_{j \in V_{i}}\Theta_{i,j}}},$

where a proportion of elements in Θ_(i) that are greater than δ₁ is denoted by γ_(i).

Step 2: If γ_(i)<δ₂, mark the view i as a pure rotation abnormality view, record a mean value of elements in the set Θ_(i,j) as $\bar{\theta}_{i,j}$, let

${l = {\underset{j \in V_{i}}{argmin}\left\{ {\overset{¯}{\theta}}_{i,j} \right\}}},$

and construct a constraint t_(i)=t_(l).

If a 3D point X^(W)=(x^(W), y^(W), z^(W))^(T) is visible in n (≤N) views, for i=1, 2 . . . , n, V_(i) is a set composed of all co-views of the view i; X_(i) and X_(j) represent normalized image coordinates of a point X^(W) on the view i and the view j, respectively; δ₁ and δ₂ are specified thresholds; R_(i) and t_(i) represent a global attitude and a global translation of the view i, respectively; R_(i,j) (=R_(j)R_(i) ^(T)) and t_(i,j) represent a relative attitude and a relative translation of the dual views (i,j), respectively; and [X_(j)]_(x) represents the antisymmetric matrix formed by the vector X_(j).

Step 3: Repeat step 1 to step 2 for all the views.

The GTL calculation step includes the following steps:

Step 1: For a current 3D point, select views

${\left( {\varsigma,\eta} \right) = {\underset{{1 \leq i},{j \leq n}}{\arg\max}\left\{ \theta_{i,j} \right\}}},$

where ς is a left baseline view and η is a right baseline view.

Step 2: For all the non-marked views (excluding the reference view), construct a GTL constraint according to the form of $B t_{\eta} + C t_{i} + D t_{\varsigma} = 0$.

Normalized image coordinates of a 3D point X^(W) on the view i satisfy $X_{i} \sim R_{\varsigma,i} X_{\varsigma} a^{T} t_{\varsigma,\eta} + \theta_{\varsigma,\eta}^{2} t_{\varsigma,i} \triangleq Y_{i}$, where $\sim$ represents an equation under homogeneous coordinates, $a^{T} = -([X_{\eta}]_{\times} R_{\varsigma,\eta} X_{\varsigma})^{T} [X_{\eta}]_{\times}$, and the superscript T represents transposition of a matrix or vector. In order to solve the global translation linearly, different target function forms are defined, for example, (I₃−X_(i)e₃ ^(T))Y_(i)=0 and [X_(i)]_(x)Y_(i)=0. I₃ represents a 3×3 identity matrix, and e₃=(0,0,1)^(T) represents the third column vector of the identity matrix. In addition, because the relative translation t_(i,j) has different forms with respect to the global translation, for example, t_(i,j)=R_(j)(t_(i)−t_(j)) and t_(i,j)=t_(j)−R_(i,j)t_(i), matrices B, C, and D also have different forms:

(1) for the target function [X_(i)]_(x)Y_(i)=0 and the relative translation t_(i,j)=R_(j)(t_(i)−t_(j)): $B = [X_{i}]_{\times} R_{\varsigma,i} X_{\varsigma} a^{T} R_{\eta}$, $C = \theta_{\varsigma,\eta}^{2} [X_{i}]_{\times} R_{i}$, $D = -(B + C)$;

(2) for the target function (I₃−X_(i)e₃ ^(T))Y_(i)=0 and the relative translation t_(i,j)=R_(j)(t_(i)−t_(j)): $B = (I_{3} - X_{i} e_{3}^{T}) R_{\varsigma,i} X_{\varsigma} a^{T} R_{\eta}$, $C = \theta_{\varsigma,\eta}^{2} (I_{3} - X_{i} e_{3}^{T}) R_{i}$, $D = -(B + C)$;

(3) for the target function [X_(i)]_(x)Y_(i)=0 and the relative translation t_(i,j)=t_(j)−R_(i,j)t_(i): $B = [X_{i}]_{\times} R_{\varsigma,i} X_{\varsigma} a^{T}$, $C = \theta_{\varsigma,\eta}^{2} [X_{i}]_{\times}$, $D = -(B R_{\varsigma,\eta} + C R_{\varsigma,i})$; and

(4) for the target function (I₃−X_(i)e₃ ^(T))Y_(i)=0 and the relative translation t_(i,j)=t_(j)−R_(i,j)t_(i): $B = (I_{3} - X_{i} e_{3}^{T}) R_{\varsigma,i} X_{\varsigma} a^{T}$, $C = \theta_{\varsigma,\eta}^{2} (I_{3} - X_{i} e_{3}^{T})$, $D = -(B R_{\varsigma,\eta} + C R_{\varsigma,i})$.

Step 3: Repeat step 1 to step 2 for other 3D points, construct a linear equation, and solve the global translation $\hat{t}$.

Step 4: Reconstruct the global translation of the marked views according to t_(i)=t_(l) by using $\hat{t}$ and t_(r).

Step 5: Screen out the correct solution of the global translation t according to $a^{T} t_{\varsigma,\eta} \geq 0$.
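Steps 1 to 5 can be put together in a small end-to-end sketch. The code below is a minimal illustration under explicit assumptions, not the implementation of the disclosure: every view is non-marked, every point is visible in every view, the data are noise-free, form (1) of B, C, and D is used with the camera model X_cam = R_(i)(X^(W) − t_(i)), the homogeneous linear system is solved with a singular value decomposition (one possible solver choice), and the name `solve_global_translation` is hypothetical.

```python
import numpy as np

def skew(v):
    """Antisymmetric matrix [v]_x."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])

def rodrigues(w):
    """Rotation matrix from a non-zero axis-angle vector w."""
    angle = np.linalg.norm(w)
    K = skew(w / angle)
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def solve_global_translation(X, R, ref=0):
    """X[i][p]: normalized coordinates of point p on view i; R[i]: global attitude.

    Returns an (N, 3) array of translations with t_ref = 0 and unit overall norm.
    """
    N, m = len(R), X[0].shape[0]
    rows, checks = [], []
    for p in range(m):
        # Step 1: baseline views (s, e) maximizing theta_{i,j} for this point.
        best, s, e = -1.0, 0, 1
        for i in range(N):
            for j in range(N):
                if i != j:
                    th = np.linalg.norm(skew(X[j][p]) @ R[j] @ R[i].T @ X[i][p])
                    if th > best:
                        best, s, e = th, i, j
        u = skew(X[e][p]) @ R[e] @ R[s].T @ X[s][p]
        a = -(u @ skew(X[e][p]))                  # a^T as a 1-D array
        checks.append((a, s, e))
        # Step 2: one 3-row block B t_e + C t_i + D t_s = 0 per view i != s.
        for i in range(N):
            if i == s:
                continue
            S_i = skew(X[i][p])
            B = S_i @ np.outer(R[i] @ R[s].T @ X[s][p], a) @ R[e]
            C = (u @ u) * (S_i @ R[i])
            row = np.zeros((3, 3 * N))
            row[:, 3 * e:3 * e + 3] += B
            row[:, 3 * i:3 * i + 3] += C
            row[:, 3 * s:3 * s + 3] -= B + C      # D = -(B + C)
            rows.append(row)
    # Step 3: impose t_ref = 0 by dropping its columns, take the null vector.
    A = np.vstack(rows)
    keep = [c for c in range(3 * N) if c // 3 != ref]
    _, _, Vt = np.linalg.svd(A[:, keep], full_matrices=False)
    t = np.zeros(3 * N)
    t[keep] = Vt[-1]
    t = t.reshape(N, 3)
    # Step 5: choose the sign for which a^T t_{s,e} >= 0 holds.
    votes = sum(np.sign(a @ (R[e] @ (t[s] - t[e]))) for a, s, e in checks)
    return t if votes >= 0 else -t

# Hypothetical noise-free scene with 4 views and 12 points.
rng = np.random.default_rng(0)
N, m = 4, 12
R_true = [rodrigues(0.1 * rng.standard_normal(3)) for _ in range(N)]
t_true = rng.uniform(-1.0, 1.0, size=(N, 3))
points = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(m, 3))
X = []
for i in range(N):
    Xc = (points - t_true[i]) @ R_true[i].T
    X.append(Xc / Xc[:, 2:3])

t_hat = solve_global_translation(X, R_true, ref=0)
expected = t_true - t_true[0]
expected = expected / np.linalg.norm(expected)
print(np.abs(t_hat - expected).max())
```

The translation is recoverable only up to a common offset and a global scale, so the comparison shifts the ground truth to the reference view and normalizes it. Step 4 (copying t_(l) to marked views) is omitted here because no view is marked in this toy scene.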

An optional camera pose optimization step is added between the GTL calculation step and the SAR step:

Express image homogeneous coordinates f_(i) of the 3D point X^(W) on the view i as follows:

$f_{i} \sim R_{\varsigma,i} z_{\varsigma}^{W} X_{\varsigma} + t_{\varsigma,i}$

where $\sim$ represents an equation under homogeneous coordinates, $z_{\varsigma}^{W} = \|[X_{\eta}]_{\times} t_{\varsigma,\eta}\| / \theta_{\varsigma,\eta}$, and a re-projection error is defined as follows:

$\varepsilon_{i} = {\frac{f_{i}}{e_{3}^{T}f_{i}} - {\tilde{f}}_{i}}$

where $\tilde{f}_{i}$ represents image coordinates of a 3D point on the view i and a third element is 1. For all views of the 3D point, a re-projection error vector ε is formed. For all 3D points, an error vector Σ is formed. A target function of global pose optimization is described as arg min Σ^(T)Σ, and an optimization solution of the global pose is calculated accordingly. It should be noted that the camera pose optimization step may be replaced with another optimization algorithm, such as a classic BA algorithm; in this case, the 3D scene point coordinates may adopt an output result of the classic BA algorithm or may be obtained by using the SAR step.

The SAR step includes:

performing analytical and weighted reconstruction on a multi-view 3D scene structure according to a camera pose.

For a current 3D point, a depth of field in the left baseline view ς is calculated as follows:

${\hat{z}}_{\varsigma}^{W} = {\sum\limits_{\substack{1 \leq j \leq n \\ j \neq \varsigma}}{\omega_{\varsigma,j}d_{\varsigma}^{({\varsigma,j})}}}$

A depth of field in the right baseline view is calculated as follows:

${\hat{z}}_{\eta}^{W} = {\sum\limits_{\substack{1 \leq j \leq n \\ j \neq \eta}}{\omega_{j,\eta}d_{\eta}^{({j,\eta})}}}$

where $d_{\eta}^{(j,\eta)} = \|[R_{j,\eta} X_{j}]_{\times} t_{j,\eta}\| / \theta_{j,\eta}$, and $\omega_{\varsigma,j}$ and $\omega_{j,\eta}$ represent weighting coefficients. For example, in analytical reconstruction of a 3D point based on the depth of field in the left baseline view, it is specified that

${\omega_{\varsigma,j} = {\theta_{\varsigma,j}/{\sum\limits_{\substack{1 \leq j \leq n \\ j \neq \varsigma}}\theta_{\varsigma,j}}}},$

and in this case, coordinates of the current 3D feature point are as follows:

$X^{W} = {\hat{z}}_{\varsigma}^{W} R_{\varsigma}^{T} X_{\varsigma} + t_{\varsigma}$

Coordinates of all the 3D points can be obtained through analytical reconstruction. Similarly, the coordinates of the 3D points can be obtained through analytical reconstruction based on the depth of field in the right baseline view. An arithmetic mean of the foregoing two categories of coordinate values of the 3D points can be calculated.
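The reconstruction based on the right baseline view uses the depth formula d_(η)^((j,η)) given above. The sketch below is illustrative only: the weights ω_(j,η) are assumed proportional to θ_(j,η) (the text does not fix them for the right view), the camera model X_cam = R_(i)(X^(W) − t_(i)) with t_(i,j)=R_(j)(t_(i)−t_(j)) is assumed, the back-projection X^(W) = z R^(T) X + t follows from that model, and the function names are hypothetical.

```python
import numpy as np

def rodrigues(w):
    """Rotation matrix from a non-zero axis-angle vector w."""
    angle = np.linalg.norm(w)
    x, y, z = w / angle
    K = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def project(R, t, Xw):
    """Normalized image coordinates, assuming X_cam = R (Xw - t)."""
    Xc = R @ (Xw - t)
    return Xc / Xc[2]

def reconstruct_from_right(X, R, t, e):
    """Weighted depth in the right baseline view e, then back-projection."""
    thetas, depths = [], []
    for j in range(len(R)):
        if j == e:
            continue
        R_je = R[e] @ R[j].T                      # R_{j,e} = R_e R_j^T
        t_je = R[e] @ (t[j] - t[e])               # t_{j,e} = R_e (t_j - t_e)
        v = R_je @ X[j]
        th = np.linalg.norm(np.cross(X[e], v))    # theta_{j,e}
        thetas.append(th)
        depths.append(np.linalg.norm(np.cross(v, t_je)) / th)  # d_e^{(j,e)}
    thetas, depths = np.array(thetas), np.array(depths)
    omega = thetas / thetas.sum()                 # assumed weighting
    z_hat = omega @ depths
    return z_hat * (R[e].T @ X[e]) + t[e]

# Hypothetical noise-free configuration with three views.
Xw = np.array([0.3, -0.2, 6.0])
R = [np.eye(3),
     rodrigues(np.array([0.0, 0.1, 0.0])),
     rodrigues(np.array([0.05, -0.05, 0.02]))]
t = [np.zeros(3), np.array([1.0, 0.0, 0.1]), np.array([0.2, 0.8, -0.1])]
X = [project(R[k], t[k], Xw) for k in range(3)]

X_right = reconstruct_from_right(X, R, t, e=1)
print(X_right)
```

With noisy observations the left-view and right-view reconstructions generally differ slightly, and their arithmetic mean, as mentioned above, can be taken as the final estimate of the point.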

Based on the foregoing pure pose solution method for a multi-view camera pose and scene, the present disclosure further provides a pure pose solution system for a multi-view camera pose and scene, including:

a PRR module configured to perform PRR on all views, and mark views having a pure rotation abnormality, to obtain marked views and non-marked views;

a GTL calculation module configured to select one of the non-marked views as a reference view, construct a constraint t_(r)=0, construct a GTL constraint, solve a global translation $\hat{t}$, reconstruct a global translation of the marked views according to t_(r) and $\hat{t}$, and screen out a correct solution of the global translation; and

an SAR module configured to perform analytical reconstruction on coordinates of all 3D points according to a correct solution of a global pose.

Those skilled in the art are aware that in addition to being realized by using pure computer-readable program code, the system and each apparatus, module, and unit thereof provided in the present disclosure can realize a same program in a form of a logic gate, a switch, an application-specific integrated circuit, a programmable logic controller, or an embedded microcontroller by performing logic programming on the method steps. Therefore, the system and each apparatus, module, and unit thereof provided in the present disclosure can be regarded as a hardware component. The apparatus, module, and unit included therein for realizing various functions can also be regarded as a structure in the hardware component; the apparatus, module and unit for realizing the functions can also be regarded as a software program for implementing the method or a structure in the hardware component.

The specific embodiments of the present disclosure are described above. It should be understood that the present disclosure is not limited to the above specific implementations, and a person skilled in the art can make various variations or modifications within the scope of the claims without affecting the essence of the present disclosure. The embodiments in the present disclosure and features in the embodiments may be arbitrarily combined with each other in a non-conflicting manner. 

What is claimed is:
 1. A pure pose solution method for a multi-view camera pose and scene, comprising: a pure rotation recognition (PRR) step, wherein the PRR step comprises performing a PRR on views, and marking views having a pure rotation abnormality of the views to obtain marked views and non-marked views; a global translation linear (GTL) calculation step, wherein the GTL calculation step comprises selecting one of the non-marked views as a reference view, constructing a constraint t_(r)=0, constructing a GTL constraint, solving a global translation $\hat{t}$, reconstructing a global translation of the marked views according to t_(r) and $\hat{t}$, and screening out a correct solution of the global translation; and a structure analytical reconstruction (SAR) step, wherein the SAR step comprises performing an analytical reconstruction on coordinates of 3D points according to a correct solution of a global pose.
 2. The pure pose solution method according to claim 1, wherein the PRR step further comprises: step 11: for a view i (1≤i≤N) and a view j (j∈V_(i)), calculating θ_(i,j)=∥[X_(j)]_(x)R_(i,j)X_(i)∥ by using image matching point pairs (X_(i),X_(j)) and a relative attitude R_(i,j) of dual views (i,j), and constructing a set Θ_(i,j) and a set ${\Theta_{i} = {\bigcup\limits_{j \in V_{i}}\Theta_{i,j}}},$  wherein a proportion of elements, greater than δ₁, in Θ_(i) is denoted by γ_(i); step 12: when γ_(i)<δ₂, marking the view i as a pure rotation abnormality view, recording a mean value of elements in the set Θ_(i,j) as θ _(i,j), letting ${l = {\underset{j \in V_{i}}{\arg\min}\left\{ {\overset{\_}{\theta}}_{i,j} \right\}}},$  and constructing a constraint t_(i)=t_(l); wherein when a 3D point X^(W)=(x^(W),y^(W),z^(W))^(T) is visible in n (≤N) views, for i=1, 2 . . . , n, V_(i) is a set configured with co-views of the view i; X_(i) and X_(j) represent a normalized image coordinate of the 3D point X^(W) on the view i and a normalized image coordinate of the 3D point X^(W) on the view j, respectively; δ₁ and δ₂ are specified thresholds; R_(i) and t_(i) represent a global attitude of the view i and a global translation of the view i, respectively; R_(i,j) (=R_(j)R_(i) ^(T)) and t_(i,j) represent a relative attitude of the dual views (i,j) and a relative translation of the dual views (i,j), respectively; and [X_(j)]_(x) represents an antisymmetric matrix formed by vectors X_(j); and step 13: repeating step 11 to step 12 for the views.
 3. The pure pose solution method according to claim 2, wherein the GTL calculation step comprises: step 21: for a current 3D point, selecting views ${\left( {\varsigma,\eta} \right) = {\underset{{1 \leq i},{j \leq n}}{\arg\max}\left\{ \theta_{i,j} \right\}}},$  wherein ς is a left baseline view, and η is a right baseline view; step 22: for the non-marked views, constructing the GTL constraint according to a form of $B t_{\eta} + C t_{i} + D t_{\varsigma} = 0$; wherein the normalized image coordinate of the 3D point X^(W) on the view i comprise $X_{i} \sim R_{\varsigma,i} X_{\varsigma} a^{T} t_{\varsigma,\eta} + \theta_{\varsigma,\eta}^{2} t_{\varsigma,i} \triangleq Y_{i}$, $\sim$ represents an equation under homogeneous coordinates, $a^{T} = -([X_{\eta}]_{\times} R_{\varsigma,\eta} X_{\varsigma})^{T} [X_{\eta}]_{\times}$, and a superscript T represents a transposition of a matrix or a transposition of a vector; and different target function forms are defined to solve the global translation linearly; wherein the relative translation t_(i,j) has different forms corresponding to the global translation, a matrix B, a matrix C, and a matrix D have different forms: (1) for a target function [X_(i)]_(x)Y_(i)=0 and the relative translation t_(i,j)=R_(j)(t_(i)−t_(j)): $B = [X_{i}]_{\times} R_{\varsigma,i} X_{\varsigma} a^{T} R_{\eta}$, $C = \theta_{\varsigma,\eta}^{2} [X_{i}]_{\times} R_{i}$, $D = -(B + C)$; (2) for a target function (I₃−X_(i)e₃ ^(T))Y_(i)=0 and the relative translation t_(i,j)=R_(j)(t_(i)−t_(j)): $B = (I_{3} - X_{i} e_{3}^{T}) R_{\varsigma,i} X_{\varsigma} a^{T} R_{\eta}$, $C = \theta_{\varsigma,\eta}^{2} (I_{3} - X_{i} e_{3}^{T}) R_{i}$, $D = -(B + C)$; (3) for the target function [X_(i)]_(x)Y_(i)=0 and the relative translation t_(i,j)=t_(j)−R_(i,j)t_(i): $B = [X_{i}]_{\times} R_{\varsigma,i} X_{\varsigma} a^{T}$, $C = \theta_{\varsigma,\eta}^{2} [X_{i}]_{\times}$, $D = -(B R_{\varsigma,\eta} + C R_{\varsigma,i})$; and (4) for the target function (I₃−X_(i)e₃ ^(T))Y_(i)=0 and the relative translation t_(i,j)=t_(j)−R_(i,j)t_(i): $B = (I_{3} - X_{i} e_{3}^{T}) R_{\varsigma,i} X_{\varsigma} a^{T}$, $C = \theta_{\varsigma,\eta}^{2} (I_{3} - X_{i} e_{3}^{T})$, $D = -(B R_{\varsigma,\eta} + C R_{\varsigma,i})$; step 23: repeating step 21 to step 22 for other 3D points, constructing a linear equation, and solving the global translation $\hat{t}$; step 24: reconstructing the global translation of the marked views according to t_(i)=t_(l) by using $\hat{t}$ and t_(r); and step 25: screening out the correct solution of the global translation t according to $a^{T} t_{\varsigma,\eta} \geq 0$.
 4. The pure pose solution method according to claim 3, further comprising a camera pose optimization step between the GTL calculation step and the SAR step, wherein the camera pose optimization step comprises: expressing image homogeneous coordinates f_(i) of the 3D point X^(W) on the view i, wherein $f_{i} \sim R_{\varsigma,i} z_{\varsigma}^{W} X_{\varsigma} + t_{\varsigma,i}$ wherein $\sim$ represents the equation under homogeneous coordinates, $z_{\varsigma}^{W} = \|[X_{\eta}]_{\times} t_{\varsigma,\eta}\| / \theta_{\varsigma,\eta}$, and a re-projection error is defined, wherein $\varepsilon_{i} = {\frac{f_{i}}{e_{3}^{T}f_{i}} - {\tilde{f}}_{i}}$ wherein e₃ ^(T)=(0,0,1), $\tilde{f}_{i}$ represents image coordinates of the 3D point on the view i and a third element of $\tilde{f}_{i}$ is 1; for views of the 3D point, a re-projection error vector ε is formed; for the 3D points, an error vector Σ is formed; a target function of a global pose optimization is described as arg min Σ^(T)Σ, and an optimization solution of the global pose is calculated; and the camera pose optimization step is further replaced with a classic bundle adjustment (BA) algorithm, wherein 3D scene point coordinates are configured by an output result of the classic BA algorithm, or the 3D scene point coordinates are obtained by using the SAR step.
 5. The pure pose solution method according to claim 3, wherein the SAR step further comprises: performing an analytical and weighted reconstruction on a multi-view 3D scene structure according to a camera pose; for the current 3D point, calculating a depth of field in the left baseline view ς, wherein ${\hat{z}}_{\varsigma}^{W} = {\underset{j \neq \varsigma}{\sum\limits_{1 \leq j \leq n}}{\omega_{\varsigma,j}d_{\varsigma}^{({\varsigma,j})}}}$; calculating a depth of field in the right baseline view, wherein ${\hat{z}}_{\eta}^{W} = {\underset{j \neq \eta}{\sum\limits_{1 \leq j \leq n}}{\omega_{j,\eta}d_{\eta}^{({j,\eta})}}}$, wherein $d_{\eta}^{(j,\eta)} = \|[R_{j,\eta}X_{j}]_{\times}t_{j,\eta}\|/\theta_{j,\eta}$, and $\omega_{\varsigma,j}$ and $\omega_{j,\eta}$ represent weighting coefficients; and performing the analytical reconstruction to obtain a first category of the coordinates of the 3D points; or performing the analytical reconstruction to obtain a second category of the coordinates of the 3D points by using the depth of field in the right baseline view, or calculating an arithmetic mean of the first and second categories of the coordinate values of the 3D points.
 6. A pure pose solution system for a multi-view camera pose and scene, comprising: a PRR module, wherein the PRR module is configured to perform a PRR on views, and mark views having a pure rotation abnormality, to obtain marked views and non-marked views; a GTL calculation module, wherein the GTL calculation module is configured to select one of the non-marked views as a reference view, construct a constraint $t_{r} = 0$, construct a GTL constraint, solve a global translation $\hat{t}$, reconstruct a global translation of the marked views according to $t_{r}$ and $\hat{t}$, and screen out a correct solution of the global translation; and a SAR module, wherein the SAR module is configured to perform an analytical reconstruction on coordinates of 3D points according to a correct solution of a global pose.
 7. The pure pose solution system according to claim 6, wherein the PRR module further comprises: a module M11, wherein the module M11 is configured to: for a view i (1≤i≤N) and a view j (j∈V_(i)), calculate θ_(i,j)=∥[X_(j)]_(x)R_(i,j)X_(i)∥ by using image matching point pairs (X_(i),X_(j)) and a relative attitude R_(i,j) of dual views (i,j), and construct a set Θ_(i,j) and a set ${\Theta_{i} = {\bigcup\limits_{j \in V_{i}}\Theta_{i,j}}},$ wherein a proportion of elements, greater than δ₁, in Θ_(i) is denoted by γ_(i); a module M12, wherein the module M12 is configured to: when γ_(i)<δ₂, mark the view i as a pure rotation abnormality view, record a mean value of elements in the set Θ_(i,j) as $\bar{\theta}_{i,j}$, let $l = {\underset{j \in V_{i}}{\arg\min}\left\{ \bar{\theta}_{i,j} \right\}}$, and construct a constraint t_(i)=t_(l); wherein when a 3D point X^(W)=(x^(W),y^(W),z^(W))^(T) is visible in n (≤N) views, for i=1, 2, . . . , n, V_(i) is a set configured with co-views of the view i; X_(i) and X_(j) represent a normalized image coordinate of the 3D point X^(W) on the view i and a normalized image coordinate of the 3D point X^(W) on the view j, respectively; δ₁ and δ₂ are specified thresholds; R_(i) and t_(i) represent a global attitude of the view i and a global translation of the view i, respectively; R_(i,j) (=R_(j)R_(i) ^(T)) and t_(i,j) represent a relative attitude of the dual views (i,j) and a relative translation of the dual views (i,j), respectively; and [X_(j)]_(x) represents an antisymmetric matrix formed by the vector X_(j); and a module M13, wherein the module M13 is configured to repeat operations of the module M11 to the module M12 for the views.
 8. The pure pose solution system according to claim 7, wherein the GTL calculation module comprises: a module M21, wherein the module M21 is configured to: for a current 3D point, select views ${\left( {\varsigma,\eta} \right) = {\underset{{1 \leq i},{j \leq n}}{\arg\max}\left\{ \theta_{i,j} \right\}}},$ wherein ς is a left baseline view, and η is a right baseline view; a module M22, wherein the module M22 is configured to: for the non-marked views, construct the GTL constraint according to a form of $Bt_{\eta} + Ct_{i} + Dt_{\varsigma} = 0$; wherein the normalized image coordinate of the 3D point $X^{W}$ on the view i satisfies $X_{i} \sim R_{\varsigma,i}X_{\varsigma}a^{T}t_{\varsigma,\eta} + \theta_{\varsigma,\eta}^{2}t_{\varsigma,i} \triangleq Y_{i}$, ~ represents an equation under homogeneous coordinates, $a^{T} = -([X_{\eta}]_{\times}R_{\varsigma,\eta}X_{\varsigma})^{T}[X_{\eta}]_{\times}$, and a superscript T represents a transposition of a matrix or a transposition of a vector; wherein the relative translation $t_{i,j}$ has different forms corresponding to the global translation, and a matrix B, a matrix C, and a matrix D have different forms accordingly: (1) for a target function $[X_{i}]_{\times}Y_{i} = 0$ and the relative translation $t_{i,j} = R_{j}(t_{i} - t_{j})$: $B = [X_{i}]_{\times}R_{\varsigma,i}X_{\varsigma}a^{T}R_{\eta}$, $C = \theta_{\varsigma,\eta}^{2}[X_{i}]_{\times}R_{i}$, $D = -(B + C)$; (2) for a target function $(I_{3} - X_{i}e_{3}^{T})Y_{i} = 0$ and the relative translation $t_{i,j} = R_{j}(t_{i} - t_{j})$: $B = (I_{3} - X_{i}e_{3}^{T})R_{\varsigma,i}X_{\varsigma}a^{T}R_{\eta}$, $C = \theta_{\varsigma,\eta}^{2}(I_{3} - X_{i}e_{3}^{T})R_{i}$, $D = -(B + C)$; (3) for the target function $[X_{i}]_{\times}Y_{i} = 0$ and the relative translation $t_{i,j} = t_{j} - R_{i,j}t_{i}$: $B = [X_{i}]_{\times}R_{\varsigma,i}X_{\varsigma}a^{T}$, $C = \theta_{\varsigma,\eta}^{2}[X_{i}]_{\times}$, $D = -(BR_{\varsigma,\eta} + CR_{\varsigma,i})$; and (4) for the target function $(I_{3} - X_{i}e_{3}^{T})Y_{i} = 0$ and the relative translation $t_{i,j} = t_{j} - R_{i,j}t_{i}$: $B = (I_{3} - X_{i}e_{3}^{T})R_{\varsigma,i}X_{\varsigma}a^{T}$, $C = \theta_{\varsigma,\eta}^{2}(I_{3} - X_{i}e_{3}^{T})$, $D = -(BR_{\varsigma,\eta} + CR_{\varsigma,i})$; a module M23, wherein the module M23 is configured to repeat operations of the module M21 to the module M22 for other 3D points, construct a linear equation, and solve the global translation $\hat{t}$; a module M24, wherein the module M24 is configured to reconstruct the global translation of the marked views according to $t_{i} = t_{l}$ by using $\hat{t}$ and $t_{r}$; and a module M25, wherein the module M25 is configured to screen out the correct solution of the global translation t according to $a^{T}t_{\varsigma,\eta} \geq 0$.
 9. The pure pose solution system according to claim 8, further comprising a camera pose optimization module, wherein the camera pose optimization module is configured to: express image homogeneous coordinates $f_{i}$ of the 3D point $X^{W}$ on the view i, wherein $f_{i} \sim z_{\varsigma}^{W}R_{\varsigma,i}X_{\varsigma} + t_{\varsigma,i}$, wherein ~ represents the equation under homogeneous coordinates, $z_{\varsigma}^{W} = \|[X_{\eta}]_{\times}t_{\varsigma,\eta}\|/\theta_{\varsigma,\eta}$, and a re-projection error is defined, wherein $\varepsilon_{i} = {\frac{f_{i}}{e_{3}^{T}f_{i}} - \tilde{f}_{i}}$, wherein $e_{3}^{T} = (0,0,1)$, $\tilde{f}_{i}$ represents image coordinates of the 3D point on the view i and a third element of $\tilde{f}_{i}$ is 1; for views of the 3D point, a re-projection error vector ε is formed; for the 3D points, an error vector Σ is formed; a target function of a global pose optimization is described as arg min Σ^(T)Σ, and an optimization solution of the global pose is calculated; and the camera pose optimization step is further replaced with a classic bundle adjustment (BA) algorithm, wherein 3D scene point coordinates are configured by an output result of the classic BA algorithm, or the 3D scene point coordinates are obtained by using the SAR step.
 10. The pure pose solution system according to claim 8, wherein the SAR module is further configured to: perform an analytical and weighted reconstruction on a multi-view 3D scene structure according to a camera pose; for the current 3D point, calculate a depth of field in the left baseline view ς, wherein ${\hat{z}}_{\varsigma}^{W} = {\underset{j \neq \varsigma}{\sum\limits_{1 \leq j \leq n}}{\omega_{\varsigma,j}d_{\varsigma}^{({\varsigma,j})}}}$; calculate a depth of field in the right baseline view, wherein ${\hat{z}}_{\eta}^{W} = {\underset{j \neq \eta}{\sum\limits_{1 \leq j \leq n}}{\omega_{j,\eta}d_{\eta}^{({j,\eta})}}}$, wherein $d_{\eta}^{(j,\eta)} = \|[R_{j,\eta}X_{j}]_{\times}t_{j,\eta}\|/\theta_{j,\eta}$, and $\omega_{\varsigma,j}$ and $\omega_{j,\eta}$ represent weighting coefficients; and perform the analytical reconstruction to obtain a first category of the coordinates of the 3D points; or perform the analytical reconstruction to obtain a second category of the coordinates of the 3D points by using the depth of field in the right baseline view, or calculate an arithmetic mean of the first and second categories of the coordinate values of the 3D points.
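
Illustrative note (not part of the claims). The GTL constraint of form (1), the depth expression used in the camera pose optimization, and the screening inequality can be checked numerically on noise-free synthetic data. The following Python sketch is only an illustration under stated assumptions, not the claimed implementation: it assumes the camera model X^C = R_i(X^W − t_i), which yields the relative translation t_(i,j) = R_j(t_i − t_j); it takes the coefficient of C as the square of θ for the two baseline views, as required by the definition of a^T; and all function names, variable names, poses, and point coordinates are hypothetical.

```python
import numpy as np

def skew(v):
    """Antisymmetric matrix [v]_x, so that skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def rotation(axis, angle):
    """Rotation matrix from an axis and an angle (Rodrigues formula)."""
    axis = np.asarray(axis, dtype=float)
    k = skew(axis / np.linalg.norm(axis))
    return np.eye(3) + np.sin(angle) * k + (1.0 - np.cos(angle)) * (k @ k)

def observe(R, t, Xw):
    """Normalized image coordinate and depth, assuming Xc = R (Xw - t)."""
    Xc = R @ (Xw - t)
    return Xc / Xc[2], Xc[2]

def gtl_blocks(Xi, Xl, Xr, Ri, Rl, Rr):
    """B, C, D of form (1) for view i with left/right baseline views l, r."""
    R_lr = Rr @ Rl.T                 # relative attitude R_{l,r} = R_r R_l^T
    R_li = Ri @ Rl.T
    v = skew(Xr) @ R_lr @ Xl         # the norm of v is theta_{l,r}
    a = -(v @ skew(Xr))              # row vector a^T stored as a 1-D array
    B = skew(Xi) @ np.outer(R_li @ Xl, a) @ Rr
    C = (v @ v) * skew(Xi) @ Ri      # theta_{l,r}^2 [X_i]_x R_i
    D = -(B + C)
    return B, C, D, a, np.sqrt(v @ v)

# Hypothetical scene: one 3D point seen by three cameras.
Xw = np.array([0.3, -0.2, 5.0])
R = [rotation([0, 1, 0], 0.0), rotation([0, 1, 0], -0.15), rotation([1, 1, 0], 0.1)]
t = [np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.1, 0.0]), np.array([0.4, -0.5, 0.2])]
obs = [observe(R[k], t[k], Xw) for k in range(3)]
X = [o[0] for o in obs]
depth = [o[1] for o in obs]

l, r, i = 0, 1, 2                    # left baseline, right baseline, third view
B, C, D, a, theta = gtl_blocks(X[i], X[l], X[r], R[i], R[l], R[r])

# Linear constraint B t_r + C t_i + D t_l = 0 on the global translations.
residual = B @ t[r] + C @ t[i] + D @ t[l]

# Depth in the left baseline view from the pose alone, and the sign test.
t_lr = R[r] @ (t[l] - t[r])
z_est = np.linalg.norm(skew(X[r]) @ t_lr) / theta
z_true = depth[l]
sign_check = a @ t_lr

print(residual, z_est, z_true, sign_check)
```

With exact data the residual vanishes up to rounding, the pose-only depth matches the true depth in the left baseline view, and the screening quantity is non-negative. Stacking such 3-row blocks over all views and 3D points gives the linear system in the global translations; with noisy data the residual would be small rather than zero.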